13. Lab: Configuration

Setting Up the Environment

To set up the environment, please see the README.md file here. Depending on your configuration, you may also need to pip install NumPy and Gym.
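As a quick sanity check once everything is installed, you can import the packages and create the humanoid environment in a Python session; importing roboschool is what registers the Roboschool environments with Gym. The exact version numbers printed will depend on your configuration.

import numpy as np
import gym
import roboschool  # registers the Roboschool* environments with Gym

env = gym.make("RoboschoolHumanoid-v0")
print(np.__version__, gym.__version__)
print(env.observation_space, env.action_space)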

Investigating Some Code

Once you have the environment set up, look at the code for the humanoid agent. The agents in the roboschool/agent_zoo directory have all been pre-trained to achieve locomotion for some time, but we will take a quick look at some elements of the code here.

def relu(x): return np.maximum(x, 0)

This is just a rectified linear unit (ReLU), a common activation function used in deep learning networks in place of the traditional sigmoid.
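For example, applied to a small NumPy array, ReLU clamps negative entries to zero and passes positive ones through unchanged:

import numpy as np

def relu(x): return np.maximum(x, 0)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # prints [0.  0.  0.  1.5]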

class SmallReactivePolicy:
    "Simple multi-layer perceptron policy, no internal state"
    def __init__(self, observation_space, action_space):
        assert weights_dense1_w.shape == (observation_space.shape[0], 256)
        assert weights_dense2_w.shape == (256, 128)
        assert weights_final_w.shape == (128, action_space.shape[0])

    def act(self, ob):
        ob[0] += -1.4 + 0.8
        x = ob
        x = relu(np.dot(x, weights_dense1_w) + weights_dense1_b)
        x = relu(np.dot(x, weights_dense2_w) + weights_dense2_b)
        x = np.dot(x, weights_final_w) + weights_final_b
        return x

The __init__ function verifies the shapes of the weights based on the observation and action spaces. These shapes vary across models, so make sure to check them if you choose to create your own observation/action values.

The act function is where the main work of the humanoid example happens. It implements a fully connected neural network: the current observation is multiplied by a set of pre-trained weights, layer by layer, and the resulting action vector is returned at the end.
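To make the shape bookkeeping concrete, here is a small standalone sketch of the same forward pass using randomly initialized stand-in weights. The sizes 44 and 17 are assumptions used only for illustration; the real code takes the actual dimensions from the environment's observation and action spaces.

import numpy as np

def relu(x): return np.maximum(x, 0)

obs_dim, act_dim = 44, 17                  # assumed sizes, standing in for the humanoid's spaces
w1 = np.random.randn(obs_dim, 256) * 0.01  # stand-ins for the pre-trained weights_dense1_w, etc.
b1 = np.zeros(256)
w2 = np.random.randn(256, 128) * 0.01
b2 = np.zeros(128)
w3 = np.random.randn(128, act_dim) * 0.01
b3 = np.zeros(act_dim)

ob = np.random.randn(obs_dim)              # stand-in for a real observation vector
x = relu(np.dot(ob, w1) + b1)              # (44,)  -> (256,)
x = relu(np.dot(x, w2) + b2)               # (256,) -> (128,)
action = np.dot(x, w3) + b3                # (128,) -> (17,)
print(action.shape)                        # (17,)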

Now that we know a bit about the algorithm being used, let's see how to run it on the humanoid. To do this, we need to let Gym and Roboschool know that we are going to use the "RoboschoolHumanoid-v0" environment. After creating that environment, we pass its observation and action spaces into the SmallReactivePolicy class.

env = gym.make("RoboschoolHumanoid-v0")
pi = SmallReactivePolicy(env.observation_space, env.action_space)

After this, there is a bit of bookkeeping for rendering frames and for starting and stopping the simulation. We won't get into too much of that here, but some of the important lines to focus on include the following:

obs = env.reset()

This resets the environment for the humanoid and allows you to run another trial of the algorithm. The other two main lines in this function are below:

a = pi.act(obs)
obs, r, done, _ = env.step(a)

pi.act(obs) generates an action based on the observation seen. That action is then passed into the env.step(a) function, where the environment is updated according to the action taken. The return values obs, r, and done are the new observation, the reward value, and whether or not the simulation is done running (note that we ignore _ on purpose, since it is not useful for our needs). The rewards are added to a running score to keep track of the total reward for the current run of the algorithm. The rest of the function is used mainly to keep the simulator updated and running, so we will not say anything further about it here.
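Putting these pieces together, a minimal sketch of one episode loop looks like the following. This is a simplification that assumes env and pi were created as shown above and omits the frame counting and rendering bookkeeping from the demo script.

score = 0
obs = env.reset()
done = False
while not done:
    a = pi.act(obs)                 # policy chooses an action from the current observation
    obs, r, done, _ = env.step(a)   # environment applies the action and returns the result
    score += r                      # accumulate reward for this run
print("episode score:", score)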